Getting Started with Analysis in R¹

Compiled by Serdar Balcı, MD, Pathologist

2019-09-24

knitr::opts_chunk$set(
    message = FALSE,
    warning = FALSE,
    comment = NA,
    include = FALSE,
    tidy = TRUE
)

Where R is used

  • Data wrangling
  • Statistical analysis
  • Building web pages (static/dynamic)
  • Preparing presentations
  • Programming
  • Automated, periodic, and reproducible reporting
  • Generating PDF, HTML, and PPT output
  • Writing a thesis
  • Writing books
  • Creating a CV
  • Preparing posters
  • Creating report templates

R generation

R has changed a great deal over the years.

https://rss.onlinelibrary.wiley.com/doi/10.1111/j.1740-9713.2018.01169.x

Installing R

http://www.youtube.com/watch?v=XcBLEVknqvY

What is R?

R-project

https://cran.r-project.org/

RStudio

https://www.rstudio.com/

https://www.rstudio.com/products/rstudio/download/

https://moderndive.com/2-getting-started.html

RStudio addins

  • Discover and install useful RStudio addins

https://cran.r-project.org/web/packages/addinslist/README.html

https://rstudio.github.io/rstudioaddins/

devtools::install_github("rstudio/addinexamples", type = "source")

For macOS

X11

https://www.xquartz.org/

Java for macOS

https://support.apple.com/kb/dl1572

R makes hard things easy and easy things hard

R Syntax Comparison::CHEAT SHEET

https://www.amelia.mn/Syntax-cheatsheet.pdf

R packages

Why packages exist

Where to find packages

Build your own package universe

Installing R packages

install.packages("tidyverse", dependencies = TRUE)
install.packages("jmv", dependencies = TRUE)
install.packages("questionr", dependencies = TRUE)
install.packages("Rcmdr", dependencies = TRUE)
install.packages("summarytools")
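Re-running install.packages() for packages that are already present costs time. The sketch below checks which of the packages listed above are missing and installs only those, and only in an interactive session. The helper name missing_packages() is hypothetical and not part of the original material.

```r
# Packages taken from the install.packages() calls above
pkgs <- c("tidyverse", "jmv", "questionr", "Rcmdr", "summarytools")

# Hypothetical helper: return the names in `pkgs` that are not installed.
# system.file(package = p) returns "" when package p cannot be found.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1))
  pkgs[!installed]
}

to_install <- missing_packages(pkgs)

# Only touch the network when a person is at the console
if (interactive() && length(to_install) > 0) {
  install.packages(to_install, dependencies = TRUE)
}
```

Checking with system.file() avoids loading the packages just to find out whether they exist.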

Loading packages

# require(tidyverse)
# require(jmv)
# require(questionr)
# library(summarytools)
# library(gganimate)

Finding help for R

  • Vignette

https://stackoverflow.com/

  • Google with suitable keywords

  • Adding [R] to a Google search can also help.
  • searcher package 📦

http://cran.r-project.org/doc/contrib/Baggott-refcard-v2.pdf

https://www.rstudio.com/resources/cheatsheets/

  • Awesome R

https://github.com/qinwf/awesome-R#readme

https://awesome-r.com/

  • Twitter

https://twitter.com/hashtag/rstats?src=hash

  • Use Reproducible Examples When Asking
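Some of the help sources above are available from the console. The sketch below uses only base R functions; the choice of topics (mean, read.csv, the grid package) and the three-row iris subset are illustrative assumptions. The last step shows one common way to build a reproducible example: dput() prints an object as code that others can paste into their own session.

```r
# Look up a help page (same as typing ?mean); print `h` to display it
h <- help("mean")

# Find functions on the search path by name pattern
hits <- apropos("^read\\.csv")
print(hits)

# List the vignettes that ship with an installed package (grid is part of base R)
v <- vignette(package = "grid")

# Reproducible example: turn a small piece of data into pasteable code
small <- head(iris, 3)
dput(small)
```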

Creating a project with RStudio

https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects

Importing data with RStudio

https://support.rstudio.com/hc/en-us/articles/218611977-Importing-Data-with-RStudio

Excel

SPSS

CSV
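The same three formats can also be read with code instead of the RStudio import dialog. This is a minimal sketch under stated assumptions: a temporary CSV file written from the built-in iris data stands in for a real file, and the Excel and SPSS steps use example files that ship with the readxl and haven packages, running only if those packages are installed.

```r
# CSV: base R needs no extra packages. A temporary file stands in for real data.
csv_path <- tempfile(fileext = ".csv")
write.csv(iris, csv_path, row.names = FALSE)
iris_csv <- read.csv(csv_path, stringsAsFactors = TRUE)
str(iris_csv)

# Excel: readxl::read_excel(), using an example workbook shipped with readxl
if (requireNamespace("readxl", quietly = TRUE)) {
  xlsx_path <- readxl::readxl_example("datasets.xlsx")
  iris_xlsx <- readxl::read_excel(xlsx_path, sheet = "iris")
}

# SPSS: haven::read_sav(), using an example file shipped with haven
if (requireNamespace("haven", quietly = TRUE)) {
  sav_path <- system.file("examples", "iris.sav", package = "haven")
  iris_sav <- haven::read_sav(sav_path)
}
```

For real data, replace the example paths with the path to your own file, ideally relative to the RStudio project folder.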

Viewing the data

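View(data)            # open the data in RStudio's spreadsheet-style viewer
data                  # print the object to the console
head(data)            # first rows
tail(data)            # last rows
dplyr::glimpse(data)  # one line per column
str(data)             # structure: types and first values
skimr::skim(data)     # summary statistics per column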
Modifying the data

Let's modify the data with code
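A minimal sketch of typical changes made with code, using base R on a copy of the built-in iris data. The new column names (Sepal.Ratio, Size), the cut points, and the relabelled level are illustrative assumptions, not taken from the original material.

```r
# Work on a copy so the original data stay untouched
iris2 <- iris

# Add a computed column
iris2$Sepal.Ratio <- iris2$Sepal.Length / iris2$Sepal.Width

# Recode a numeric variable into ordered categories
iris2$Size <- cut(iris2$Sepal.Length,
                  breaks = c(-Inf, 5.5, 6.5, Inf),
                  labels = c("short", "medium", "long"))

# Rename one level of a factor
levels(iris2$Species)[levels(iris2$Species) == "setosa"] <- "Setosa"

table(iris2$Size, iris2$Species)
```

Keeping every change in a script means the whole cleaning process can be repeated from the raw file at any time.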

Modifying the data with addins

Recoding through RStudio

The questionr package will be used.

https://juba.github.io/questionr/articles/recoding_addins.html

Basic descriptive statistics

summary()
mean()
median()
min()
max()
sd()
table()

library(readr)
irisdata <- read_csv("data/iris.csv")
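The base functions listed above can be tried on the built-in iris dataset before moving on to the package-based summaries. This is a sketch that assumes the built-in `iris` data frame rather than the `irisdata` object read from file; the per-species means should agree, after rounding, with the Mean rows of the jmv table that follows.

```r
x <- iris$Sepal.Length

summary(x)           # min, quartiles, mean, max in one call
mean(x)
median(x)
min(x)
max(x)
sd(x)
table(iris$Species)  # frequency of each category

# Mean sepal length within each species
by_species <- tapply(iris$Sepal.Length, iris$Species, mean)
round(by_species, 2)
```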

jmv::descriptives(data = irisdata, vars = "Sepal.Length", splitBy = "Species", 
    freq = TRUE, hist = TRUE, dens = TRUE, bar = TRUE, box = TRUE, violin = TRUE, 
    dot = TRUE, mode = TRUE, sum = TRUE, sd = TRUE, variance = TRUE, range = TRUE, 
    se = TRUE, skew = TRUE, kurt = TRUE, quart = TRUE, pcEqGr = TRUE)

 DESCRIPTIVES

 Descriptives                                          
 ───────────────────────────────────────────────────── 
                          Species       Sepal.Length   
 ───────────────────────────────────────────────────── 
   N                      setosa                  50   
                          versicolor              50   
                          virginica               50   
   Missing                setosa                   0   
                          versicolor               0   
                          virginica                0   
   Mean                   setosa                5.01   
                          versicolor            5.94   
                          virginica             6.59   
   Std. error mean        setosa              0.0498   
                          versicolor          0.0730   
                          virginica           0.0899   
   Median                 setosa                5.00   
                          versicolor            5.90   
                          virginica             6.50   
   Mode                   setosa                5.00   
                          versicolor            5.50   
                          virginica             6.30   
   Sum                    setosa                 250   
                          versicolor             297   
                          virginica              329   
   Standard deviation     setosa               0.352   
                          versicolor           0.516   
                          virginica            0.636   
   Variance               setosa               0.124   
                          versicolor           0.266   
                          virginica            0.404   
   Range                  setosa                1.50   
                          versicolor            2.10   
                          virginica             3.00   
   Minimum                setosa                4.30   
                          versicolor            4.90   
                          virginica             4.90   
   Maximum                setosa                5.80   
                          versicolor            7.00   
                          virginica             7.90   
   Skewness               setosa               0.120   
                          versicolor           0.105   
                          virginica            0.118   
   Std. error skewness    setosa               0.337   
                          versicolor           0.337   
                          virginica            0.337   
   Kurtosis               setosa              -0.253   
                          versicolor          -0.533   
                          virginica           0.0329   
   Std. error kurtosis    setosa               0.662   
                          versicolor           0.662   
                          virginica            0.662   
   25th percentile        setosa                4.80   
                          versicolor            5.60   
                          virginica             6.23   
   50th percentile        setosa                5.00   
                          versicolor            5.90   
                          virginica             6.50   
   75th percentile        setosa                5.20   
                          versicolor            6.30   
                          virginica             6.90   
 ───────────────────────────────────────────────────── 

# install.packages('scatr')

scatr::scat(data = irisdata, x = "Sepal.Length", y = "Sepal.Width", group = "Species", 
    marg = "dens", line = "linear", se = TRUE)

summarytools

https://cran.r-project.org/web/packages/summarytools/vignettes/Introduction.html

library(summarytools)
summarytools::freq(iris$Species, style = "rmarkdown")

Frequencies

iris$Species

Type: Factor

              Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
setosa          50     33.33          33.33     33.33          33.33
versicolor      50     33.33          66.67     33.33          66.67
virginica       50     33.33         100.00     33.33         100.00
<NA>             0                               0.00         100.00
Total          150    100.00         100.00    100.00         100.00

summarytools::freq(iris$Species, report.nas = FALSE, style = "rmarkdown", headings = FALSE)

              Freq        %   % Cum.
setosa          50    33.33    33.33
versicolor      50    33.33    66.67
virginica       50    33.33   100.00
Total          150   100.00   100.00

with(tobacco, print(ctable(smoker, diseased), method = "render"))

Cross-Tabulation, Row Proportions

smoker * diseased

Data Frame: tobacco

                    diseased
smoker    Yes             No              Total
Yes       125 (41.9%)     173 (58.1%)      298 (100.0%)
No         99 (14.1%)     603 (85.9%)      702 (100.0%)
Total     224 (22.4%)     776 (77.6%)     1000 (100.0%)

Generated by summarytools 0.9.4 (R version 3.6.0)
2019-09-24

with(tobacco, print(ctable(smoker, diseased, prop = "n", totals = FALSE), omit.headings = TRUE, 
    method = "render"))

Cross-Tabulation

smoker * diseased

Data Frame: tobacco

            diseased
smoker    Yes     No
Yes       125    173
No         99    603

summarytools::descr(iris, style = "rmarkdown")

Descriptive Statistics

iris

N: 150

               Petal.Length   Petal.Width   Sepal.Length   Sepal.Width
Mean                   3.76          1.20           5.84          3.06
Std.Dev                1.77          0.76           0.83          0.44
Min                    1.00          0.10           4.30          2.00
Q1                     1.60          0.30           5.10          2.80
Median                 4.35          1.30           5.80          3.00
Q3                     5.10          1.80           6.40          3.30
Max                    6.90          2.50           7.90          4.40
MAD                    1.85          1.04           1.04          0.44
IQR                    3.50          1.50           1.30          0.50
CV                     0.47          0.64           0.14          0.14
Skewness              -0.27         -0.10           0.31          0.31
SE.Skewness            0.20          0.20           0.20          0.20
Kurtosis              -1.42         -1.36          -0.61          0.14
N.Valid              150.00        150.00         150.00        150.00
Pct.Valid            100.00        100.00         100.00        100.00

descr(iris, stats = c("mean", "sd", "min", "med", "max"), transpose = TRUE, 
    headings = FALSE, style = "rmarkdown")

                Mean   Std.Dev    Min   Median    Max
Petal.Length    3.76      1.77   1.00     4.35   6.90
Petal.Width     1.20      0.76   0.10     1.30   2.50
Sepal.Length    5.84      0.83   4.30     5.80   7.90
Sepal.Width     3.06      0.44   2.00     3.00   4.40

# view(dfSummary(iris))

dfSummary(tobacco, plain.ascii = FALSE, style = "grid")

Data Frame Summary

tobacco

Dimensions: 1000 x 9
Duplicates: 2

No  Variable        Stats / Values              Freqs (% of Valid)    Valid          Missing
1   gender          1. F                        489 (50.0%)           978 (97.8%)    22 (2.2%)
    [factor]        2. M                        489 (50.0%)

2   age             Mean (sd) : 49.6 (18.3)     63 distinct values    975 (97.5%)    25 (2.5%)
    [numeric]       min < med < max:
                    18 < 50 < 80
                    IQR (CV) : 32 (0.4)

3   age.gr          1. 18-34                    258 (26.5%)           975 (97.5%)    25 (2.5%)
    [factor]        2. 35-50                    241 (24.7%)
                    3. 51-70                    317 (32.5%)
                    4. 71 +                     159 (16.3%)

4   BMI             Mean (sd) : 25.7 (4.5)      974 distinct values   974 (97.4%)    26 (2.6%)
    [numeric]       min < med < max:
                    8.8 < 25.6 < 39.4
                    IQR (CV) : 5.7 (0.2)

5   smoker          1. Yes                      298 (29.8%)           1000 (100%)    0 (0%)
    [factor]        2. No                       702 (70.2%)

6   cigs.per.day    Mean (sd) : 6.8 (11.9)      37 distinct values    965 (96.5%)    35 (3.5%)
    [numeric]       min < med < max:
                    0 < 0 < 40
                    IQR (CV) : 11 (1.8)

7   diseased        1. Yes                      224 (22.4%)           1000 (100%)    0 (0%)
    [factor]        2. No                       776 (77.6%)

8   disease         1. Hypertension             36 (16.2%)            222 (22.2%)    778 (77.8%)
    [character]     2. Cancer                   34 (15.3%)
                    3. Cholesterol              21 ( 9.5%)
                    4. Heart                    20 ( 9.0%)
                    5. Pulmonary                20 ( 9.0%)
                    6. Musculoskeletal          19 ( 8.6%)
                    7. Diabetes                 14 ( 6.3%)
                    8. Hearing                  14 ( 6.3%)
                    9. Digestive                12 ( 5.4%)
                    10. Hypotension             11 ( 5.0%)
                    [ 3 others ]                21 ( 9.5%)

9   samp.wgts       Mean (sd) : 1 (0.1)         0.86!: 267 (26.7%)    1000 (100%)    0 (0%)
    [numeric]       min < med < max:            1.04!: 249 (24.9%)
                    0.9 < 1 < 1.1               1.05!: 324 (32.4%)
                    IQR (CV) : 0.2 (0.1)        1.06!: 160 (16.0%)
                                                ! rounded

# First save the results

iris_stats_by_species <- by(data = iris, INDICES = iris$Species, FUN = descr, 
    stats = c("mean", "sd", "min", "med", "max"), transpose = TRUE)

# Then use view(), like so:

view(iris_stats_by_species, method = "pander", style = "rmarkdown")

Descriptive Statistics

iris

Group: Species = setosa
N: 50

                Mean   Std.Dev    Min   Median    Max
Petal.Length    1.46      0.17   1.00     1.50   1.90
Petal.Width     0.25      0.11   0.10     0.20   0.60
Sepal.Length    5.01      0.35   4.30     5.00   5.80
Sepal.Width     3.43      0.38   2.30     3.40   4.40

Group: Species = versicolor
N: 50

                Mean   Std.Dev    Min   Median    Max
Petal.Length    4.26      0.47   3.00     4.35   5.10
Petal.Width     1.33      0.20   1.00     1.30   1.80
Sepal.Length    5.94      0.52   4.90     5.90   7.00
Sepal.Width     2.77      0.31   2.00     2.80   3.40

Group: Species = virginica
N: 50

                Mean   Std.Dev    Min   Median    Max
Petal.Length    5.55      0.55   4.50     5.55   6.90
Petal.Width     2.03      0.27   1.40     2.00   2.50
Sepal.Length    6.59      0.64   4.90     6.50   7.90
Sepal.Width     2.97      0.32   2.20     3.00   3.80

# view(iris_stats_by_species)

data(tobacco)  # tobacco is an example dataframe included in the package
BMI_by_age <- with(tobacco, by(BMI, age.gr, descr, stats = c("mean", "sd", "min", 
    "med", "max")))
view(BMI_by_age, "pander", style = "rmarkdown")

Descriptive Statistics

BMI by age.gr

Data Frame: tobacco
N: 258

           18-34   35-50   51-70    71 +
Mean       23.84   25.11   26.91   27.45
Std.Dev     4.23    4.34    4.26    4.37
Min         8.83   10.35    9.01   16.36
Median     24.04   25.11   26.77   27.52
Max        34.84   39.44   39.21   38.37

BMI_by_age <- with(tobacco, by(BMI, age.gr, descr, transpose = TRUE, stats = c("mean", 
    "sd", "min", "med", "max")))

view(BMI_by_age, "pander", style = "rmarkdown", omit.headings = TRUE)

Descriptive Statistics

BMI by age.gr

Data Frame: tobacco
N: 258

          Mean   Std.Dev     Min   Median     Max
18-34    23.84      4.23    8.83    24.04   34.84
35-50    25.11      4.34   10.35    25.11   39.44
51-70    26.91      4.26    9.01    26.77   39.21
71 +     27.45      4.37   16.36    27.52   38.37

tobacco_subset <- tobacco[, c("gender", "age.gr", "smoker")]
freq_tables <- lapply(tobacco_subset, freq)

# view(freq_tables, footnote = NA, file = 'freq-tables.html')

what.is(iris)

$properties
       property        value
1         class   data.frame
2        typeof         list
3          mode         list
4  storage.mode         list
5           dim      150 x 5
6        length            5
7     is.object         TRUE
8   object.type           S3
9   object.size   7256 Bytes

$attributes.lengths
    names     class row.names
        5         1       150

$extensive.is
[1] "is.data.frame" "is.list"       "is.object"     "is.recursive"
[5] "is.unsorted"

skimr

library(skimr)
skim(df)

DataExplorer

library(DataExplorer)
DataExplorer::create_report(df)

inspectdf

https://github.com/alastairrushworth/inspectdf

Plots

descr(tobacco, style = 'rmarkdown')

print(descr(tobacco), method = 'render', table.classes = 'st-small')

dfSummary(tobacco, style = 'grid', plain.ascii = FALSE)

print(dfSummary(tobacco, graph.magnif = 0.75), method = 'render')

Some graphical interfaces

Rcmdr

library(Rcmdr)

Rcmdr::Commander()

  • A Comparative Review of the R Commander GUI for R

http://r4stats.com/articles/software-reviews/r-commander/

jamovi

https://www.jamovi.org/

https://blog.jamovi.org/2018/07/30/rj.html

Where to learn R

https://sbalci.github.io/MyRCodesForDataAnalysis/WhereToLearnR.nb.html

Upcoming topics

  • Using GitHub with RStudio
  • Reproducible reports with R Markdown and R Notebook
  • Hypothesis tests

Feedback

# Save Final Data

The data are saved after analysis to `Data-After-Analysis.rds` and `Data-After-Analysis.xlsx`.

saveRDS(mydata, "Data-After-Analysis.rds")

writexl::write_xlsx(mydata, "Data-After-Analysis.xlsx")

file.info("Data-After-Analysis.xlsx")$ctime
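A quick way to confirm that a saved file is usable is to read it back and compare it with the object in memory. The sketch below does this for the .rds format with base R only. It assumes the built-in iris data as a stand-in for `mydata` and writes to a temporary folder rather than the project directory.

```r
# Stand-in for the analysed data (assumption: the real `mydata` comes from the analysis)
mydata <- iris

# Save to a temporary folder so the example leaves the project untouched
rds_path <- file.path(tempdir(), "Data-After-Analysis.rds")
saveRDS(mydata, rds_path)

# Read the file back and check that nothing changed on the way
restored <- readRDS(rds_path)
identical(restored, mydata)

# Time the file was last modified
file.info(rds_path)$mtime
```

The .rds format keeps R-specific details such as factor levels, which a spreadsheet export may not preserve.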

Libraries Used

citation("tidyverse")
citation("foreign")
citation("tidylog")
citation("janitor")
citation("jmv")
citation("tangram")
citation("finalfit")
citation("summarytools")
citation("ggstatsplot")
citation("readxl")

Notes

Completed on 2019-09-24 19:34:31.

Serdar Balci, MD, Pathologist

https://rpubs.com/sbalci/CV
https://sbalci.github.io/
https://github.com/sbalci


  1. This is a compilation; I have tried to cite the sources of quoted material wherever possible.